Auto-Differentiating Linear Algebra

Authors

  • Matthias W. Seeger
  • Asmus Hetzel
  • Zhenwen Dai
  • Neil D. Lawrence
Abstract

Development systems for deep learning, such as Theano, Torch, TensorFlow, or MXNet, are easy-to-use tools for creating complex neural network models. Since gradient computations are automatically baked in, and execution is mapped to high-performance hardware, these models can be trained end-to-end on large amounts of data. However, it is currently not easy to implement many basic machine learning primitives in these systems (such as Gaussian processes, least squares estimation, principal components analysis, Kalman smoothing), mainly because they lack efficient support of linear algebra primitives as differentiable operators. We detail how a number of matrix decompositions (Cholesky, LQ, symmetric eigen) can be implemented as differentiable operators. We have implemented these primitives in MXNet, running on CPU and GPU in single and double precision. We sketch use cases of these new operators, learning Gaussian process and Bayesian linear regression models. Our implementation is based on BLAS/LAPACK APIs, for which highly tuned implementations are available on all major CPUs and GPUs.
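As an illustration of what such differentiable operators enable, below is a minimal sketch (not the authors' reference code) of a Gaussian process negative log marginal likelihood assembled from MXNet's linalg operators (potrf for the Cholesky factor, trsm for triangular solves, sumlogdiag for the log-determinant) and differentiated with autograd. The RBF kernel, the random data, and the hyperparameter names are illustrative assumptions, not taken from the paper.

    # Minimal sketch: GP negative log marginal likelihood from MXNet linalg ops.
    # The kernel, data, and hyperparameter names below are illustrative only.
    import math
    import mxnet as mx
    from mxnet import autograd, nd

    def rbf_kernel(X, log_ls):
        # Squared-exponential kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 l^2))
        sq = nd.sum(X * X, axis=1, keepdims=True)
        d2 = sq + sq.T - 2.0 * nd.dot(X, X.T)
        return nd.exp(-0.5 * d2 / nd.exp(2.0 * log_ls))

    def gp_nll(X, y, log_ls, log_noise):
        n = X.shape[0]
        K = rbf_kernel(X, log_ls) + nd.exp(2.0 * log_noise) * nd.eye(n)
        L = nd.linalg.potrf(K)                        # Cholesky factor: K = L L^T
        z = nd.linalg.trsm(L, y.reshape((n, 1)))      # solve L z = y
        alpha = nd.linalg.trsm(L, z, transpose=True)  # solve L^T alpha = z
        data_fit = 0.5 * nd.sum(y.reshape((n, 1)) * alpha)
        log_det = nd.linalg.sumlogdiag(L)             # sum(log diag L) = 0.5 * log|K|
        return data_fit + log_det + 0.5 * n * math.log(2.0 * math.pi)

    X = nd.random.normal(shape=(50, 3))
    y = nd.random.normal(shape=(50,))
    log_ls, log_noise = nd.array([0.0]), nd.array([-1.0])
    log_ls.attach_grad()
    log_noise.attach_grad()

    with autograd.record():
        nll = gp_nll(X, y, log_ls, log_noise)
    nll.backward()                                    # gradients flow through potrf/trsm
    print(nll.asscalar(), log_ls.grad.asscalar(), log_noise.grad.asscalar())

With the gradients available in log_ls.grad and log_noise.grad, the kernel hyperparameters can be fitted with any gradient-based optimizer, and the same code runs on CPU or GPU by placing the arrays in the corresponding context.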


Similar articles

Automatic Generation of Tiled and Parallel Linear Algebra Routines A partitioning framework for the BTO Compiler

Exploiting parallelism in modern hardware is necessary to achieve high performance in linear algebra routines. Unfortunately, modern architectures are complex, so many optimization choices must be considered to find the combination that delivers the best performance. Exploring optimizations by hand is costly and time consuming. Auto-tuning systems offer a method for quickly generating and evalua...


Auto-tuning Parallel Programs at Compiler- and Application-Levels

Auto-tuning has recently received its fair share of attention from the High Performance Computing community. Most auto-tuning approaches are specialized to work either on specific domains (dense/sparse linear algebra, stencil computations, etc.) or only at certain stages of program execution (compile-time, launch-time, or run-time). Real scientific applications, however, demand a cohesive environmen...


Auto-tuning of level 1 and level 2 BLAS for GPUs

The use of high performance libraries for dense linear algebra operations is of great importance in many numerical scientific applications. The most common operations form the backbone of the Basic Linear Algebra Subroutines (BLAS) library. In this paper, we consider the performance and auto-tuning of level 1 and level 2 BLAS routines on GPUs. As examples, we develop single-precision CUDA kerne...


Auto-tuning a Matrix Routine for High Performance

Well-written scientific simulations typically get tremendous performance gains by using highly optimized library routines. Some of the most fundamental of these routines perform matrix-matrix multiplication and related operations, known as the BLAS (Basic Linear Algebra Subprograms). Optimizing these library routines for efficiency is therefore of tremendous importance for many scientific simulation...


Auto-Optimization of Linear Algebra Parallel Routines: The Cholesky Factorization

© 2006 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher ment...




Journal:
  • CoRR

Volume: abs/1710.08717
Issue: -
Pages: -
Publication date: 2017